LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Abstract

We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
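
The thresholded variant described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: sentences are tokenized by whitespace, cosine similarity is taken over raw term counts rather than any weighted scheme, and the function name `lexrank`, the threshold of 0.1, and the PageRank-style damping of 0.85 are illustrative choices. The similarity graph is binarized at the threshold, row-normalized into a stochastic matrix, and its dominant eigenvector is approximated by power iteration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (raw term counts)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences by eigenvector centrality on a thresholded similarity graph.

    threshold and damping are assumed values chosen for illustration.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    if n == 0:
        return []

    # Adjacency matrix: an edge wherever cosine similarity exceeds the threshold.
    # Each non-empty sentence is linked to itself, since its self-similarity is 1.
    adj = [[1.0 if cosine(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]

    # Row-normalize into a stochastic matrix; a row with no edges becomes uniform.
    trans = []
    for row in adj:
        degree = sum(row)
        trans.append([v / degree for v in row] if degree else [1.0 / n] * n)

    # Power iteration with damping until the scores stop changing.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


sentences = ["cats chase mice", "cats sleep", "mice hide"]
scores = lexrank(sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])
```

In this toy input the first sentence shares a word with each of the other two, which share nothing with each other, so it has the most edges and receives the highest score. An extractive summary would then keep the top-scoring sentences.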
